Radim Hladík¹, Pierre Benz², Yann Renisio³
¹ Institute of Philosophy, Czech Academy of Sciences
² School of Library and Information Science, University of Montreal
³ Centre for Research on social InequalitieS, CNRS & Sciences Po
graph TD
subgraph Data Input
M1["M1: Document-Topic Matrix (Ndoc x Ntop)
(~800,000 x ~3,600)"]
end
subgraph Discipline-Level Processing
M2["M2: Discipline-Topic Portfolios (Ndisc x Ntop)
(42 x ~3,600)"]
M3["M3: Processed for Topic PCA (Ntop x Ndisc)
(~3,600 x 42)"]
M4["M4: Global PC Scores (Ntop x Npc)
(~3,600 x 42 PCs)"]
end
subgraph Individual-Level Processing
M5["M5: Author-Topic Portfolios (Nauth x Ntop)
(~100,000 x ~3,600)"]
end
subgraph Final Output
M6["M6: Authors' Global PC Coordinates (Nauth x Npc)
(~100,000 x 42 PCs)"]
end
M1 -- "Mean by discipline" --> M2
M2 -- "CLR-transform, scale, transpose" --> M3
M3 -- "PCA" --> M4
M1 -- "Mean by author" --> M5
M5 & M4 -- "M5 × M4" --> M6
style M1 fill:#ccff99,stroke:#333,stroke-width:2px
style M2 fill:#b0e0e6,stroke:#333,stroke-width:2px
style M3 fill:#b0e0e6,stroke:#333,stroke-width:2px
style M4 fill:#b0e0e6,stroke:#333,stroke-width:2px
style M5 fill:#e0b0e6,stroke:#333,stroke-width:2px
style M6 fill:#d8bfd8,stroke:#333,stroke-width:2px
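The matrix pipeline in the diagram above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`group_means`, `clr`, `global_map`), the `eps` offset that guards against log(0), and the reading of "scale" as dividing each discipline's CLR vector by its standard deviation are all assumptions. PCA is done through an SVD of the centred matrix, and M4 holds the topic scores.

```python
import numpy as np


def group_means(X, labels):
    """Average the rows of X within each label (labels sorted ascending)."""
    uniq, inv = np.unique(labels, return_inverse=True)
    inv = inv.ravel()
    sums = np.zeros((len(uniq), X.shape[1]))
    np.add.at(sums, inv, X)  # accumulate rows per group
    counts = np.bincount(inv, minlength=len(uniq))
    return uniq, sums / counts[:, None]


def clr(P, eps=1e-9):
    """Centered log-ratio transform of each row; eps (assumed) avoids log(0)."""
    logp = np.log(P + eps)
    return logp - logp.mean(axis=1, keepdims=True)


def global_map(M1, doc_discipline, doc_author):
    """Follow M1 -> M2 -> M3 -> M4 and M1 -> M5 -> M6 from the diagram."""
    _, M2 = group_means(M1, doc_discipline)      # Ndisc x Ntop
    Z = clr(M2)
    Z = Z / Z.std(axis=1, keepdims=True)         # assumed meaning of "scale"
    M3 = Z.T                                     # Ntop x Ndisc
    M3c = M3 - M3.mean(axis=0)                   # already ~0 after CLR; kept for clarity
    U, S, _ = np.linalg.svd(M3c, full_matrices=False)
    M4 = U * S                                   # Ntop x Npc topic scores
    authors, M5 = group_means(M1, doc_author)    # Nauth x Ntop
    M6 = M5 @ M4                                 # Nauth x Npc
    return authors, M4, M6


# Toy data: 30 documents, 6 topics, 4 disciplines, 10 authors.
rng = np.random.default_rng(0)
M1 = rng.dirichlet(np.ones(6), size=30)
authors, M4, M6 = global_map(M1, np.arange(30) % 4, np.arange(30) % 10)
print(M4.shape, M6.shape)  # (6, 4) (10, 4)
```

With 42 disciplines, the same steps yield at most 42 principal components, which matches the dimensions given for M4 and M6.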
graph TD
%% Define Nodes with simpler labels
A[M6: Authors' Global PC Coordinates] --> B{Subsetting by Discipline}
B --> C1[Subset 1]
B --> C_n[...]
B --> C42[Subset 42]
C1 --> PCA1(PCA on Subset 1)
C_n --> PCAn(...)
C42 --> PCA42(PCA on Subset 42)
PCA1 --> Topics1(Topic frequencies 1)
PCAn --> Topics_n(...)
PCA42 --> Topics42(Topic frequencies 42)
%% Styling (simplified for robustness)
classDef mainNode fill:#d8bfd8,stroke:#333;
classDef processNode fill:#e0f2f7,stroke:#333;
classDef subsetNode fill:#fffacd,stroke:#333;
class A mainNode;
class B processNode;
class C1,C_n,C42 subsetNode;
class PCA1,PCAn,PCA42 processNode;
Inspired by Class Specific Analysis (CSA), but performed on coordinates of projected individuals.
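A minimal sketch of this step, assuming the global coordinates are available as a NumPy array with one row per author and a parallel array of discipline labels. The function names are hypothetical, and this is a plain PCA of each subset rather than a full CSA implementation. Each local axis is returned as a vector in the global PC space, so it can be drawn directly on the global map.

```python
import numpy as np


def local_pca(coords):
    """PCA of one discipline's authors inside the global coordinate space."""
    centered = coords - coords.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = S**2 / (len(coords) - 1)  # variance carried by each local axis
    return Vt, variances                  # rows of Vt: local axes in global coordinates


def local_pcas_by_discipline(M6, author_discipline):
    """Run a separate PCA on each discipline's subset of authors."""
    author_discipline = np.asarray(author_discipline)
    return {
        d: local_pca(M6[author_discipline == d])
        for d in np.unique(author_discipline).tolist()
    }


# Toy data: 60 authors in a 5-dimensional global space, 3 disciplines.
rng = np.random.default_rng(1)
M6 = rng.normal(size=(60, 5))
author_discipline = np.arange(60) % 3
results = local_pcas_by_discipline(M6, author_discipline)
axes, variances = results[0]
```

Summing `variances` for a subset gives the total variance of that discipline's authors in the global space.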
Map of science. Source: (Hladík and Renisio 2025)
Disciplinary subsets and their principal components. Each panel shows the location of individuals affiliated with a discipline in the global map. The primary axes of each local PCA are drawn in the original global coordinates (PC1 in red, PC2 in blue).
Sum of the variances of the local PCA axes, by discipline
Cosine similarity between the primary and secondary global and local principal components. Disciplines are ordered by decreasing cosine similarity with PC1.
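Because the local axes are expressed in the global PC coordinates, the global PCs are simply the unit basis vectors, and the cosine between a local axis and global PC j is the j-th component of the normalised local axis. The sketch below illustrates this under assumptions: the function names are hypothetical, and taking the absolute value (since the sign of a principal axis is arbitrary) is a choice made here, not something the source states.

```python
import numpy as np


def axis_cosines(local_axes, n=2):
    """|cos| between the first n local axes and the first n global PCs.

    Entry [k, j] compares local axis k with global PC j.
    """
    A = np.asarray(local_axes, dtype=float)[:n]
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    return np.abs(A[:, :n])


def order_by_pc1_alignment(axes_by_discipline):
    """Order disciplines by decreasing |cos| of local PC1 with global PC1."""
    score = {d: axis_cosines(ax)[0, 0] for d, ax in axes_by_discipline.items()}
    return sorted(score, key=score.get, reverse=True)


# Toy example: one discipline aligned with the global axes, one rotated by 30 degrees.
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
axes_by_discipline = {
    "aligned": np.eye(3),
    "rotated": np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]),
}
order = order_by_pc1_alignment(axes_by_discipline)
print(order)  # ['aligned', 'rotated']
```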
| negative_pc1 | positive_pc1 | negative_pc2 | positive_pc2 |
|---|---|---|---|
| mesenchymal msc mscs multipotent limbal | beetle carabidae carabid saproxylic germanica | habitus buprestidae hydrophilidae dermestidae anthaxia | biomass aboveground miscanthus bioenergy phalaris |
| tnf anti_inflammatory crp lps calprotectin | carpathians highland extinct foothill novohradské | monogenean monogenea cichlid teleostei dactylogyrus | phenol chlorinate reductive halogenate dehalogenation |
| isomer oxidize electron_transfer fullerene halogen | assemblage contemporaneous millennia olešnice vranovice | shedding | sativa sowing sorghum radish újezd |
| kinase mapk akt mtor mitogen | fauna faunistic new_record first_record palearctic | nile mosquito mosquitoe culex wnv | photosystem psii ppb thylakoid pcc |
| amine biogenic polyamine histamine tyramine | spider araneae harvestman scorpion zodarion | borrelia sensu lyme burgdorferi burgdorferus | laccase decolorization pleurotus anthraquinone ostreatus |
| catalyze halide nitro ortho imidazole | steppe phytosociological relevé swamp ruderal | oocyte cumulus gcs granulosa blastocyst | chlorophyll_fluorescence irradiance deplet acclimation naked |
| hydrazone ferritin chelator hepcidin transferrin | coleoptera palaearctic genitalia nepal socotra | lice chew myrsidea louse phthiraptera | reclamation coal_mining heap spoil sokolov |
| adenine guanine cytosine nucleobase oligonucleotides | vegetation landsat evergreen carr carlsbad | tapeworm cestoda cestode sucker scolex | spruce beech norway_spruce picea_aby spruce_stand |
| inhibitor inhibitors mgm equipotent irae | peat jihlava bog abandoned peatland | chrysomelidae curculionidae galerucinae scolytinae medvedev | carotenoid chl carotene lutein fractionate |
| ligand crystallography complexes ligands thioether | habitat biodiversity species_richness habitats species_diversity | tick ricinus ixodes nymph ixode | rice sinapis vicia pistachio vineyards |
Prompt: You will be provided with a table with four sets of topics representative of authors from four clusters in Biological Sciences.
Your task as an expert annotator with deep knowledge of Biological Sciences is to identify the overarching theme of each cluster based on the representative topics.
Output the following items (in English) that describe the topic of the cluster: ‘short label’ (at most 3 words and format in Title Case), ‘long label’ (at most 8 words and format in Title Case), list of 10 ‘keywords’ (ordered by relevance and format in Title Case), and ‘summary’ (few sentences).
Cluster 1 - Short Label: Molecular Cell Biology
Long Label: Molecular Biology, Cell Signaling, And Biochemistry
Summary: This cluster focuses on molecular and cellular biology, investigating biochemical processes, cell signaling pathways (like MAPK and AKT/mTOR), stem cell biology (MSCs), and inflammatory responses. Research includes enzyme catalysis, ligand interactions, and redox reactions at the cellular level.
Cluster 2 - Short Label: Ecology & Biodiversity
Long Label: Field Ecology, Biodiversity, Faunistics, And Biogeography
Summary: This cluster centers on field ecology, the study of biodiversity, and faunistics, particularly concerning insect (e.g., beetles) and arachnid communities. It involves research on species assemblages in various habitats like steppes, bogs, and forests, often with a biogeographical focus and including vegetation and paleoecological analysis.
Cluster 3 - Short Label: Parasitology & Disease Vectors
Long Label: Parasitology, Medical Entomology, And Vector-Borne Diseases
Summary: This cluster focuses on parasitology, medical and veterinary entomology, and the ecology of vector-borne diseases. Topics include the study of various parasites (e.g., tapeworms, monogeneans), disease vectors such as ticks, mosquitoes, and lice, and associated pathogens like Borrelia (Lyme disease) and West Nile Virus.
Cluster 4 - Short Label: Plant & Environmental Science
Long Label: Plant Science, Photosynthesis, Bioenergy, And Bioremediation
Summary: This cluster covers plant sciences with applications in bioenergy, agriculture, forestry, and environmental remediation. Research includes studies on photosynthesis, plant biomass production (e.g., Miscanthus), crop species, forest ecology, and the use of plants and microbes for the bioremediation of contaminated sites and land reclamation.
| negative_pc1 | positive_pc1 | negative_pc2 | positive_pc2 |
|---|---|---|---|
| religious religion religiosity secularization anthropologist | pedestrian traffic_accident crash lane roundabout | forensic criminalistic criminology investigative dangerousness | small_medium smes sme medium_size medium_sized |
| sexual habit sexuality arousal harassment | utility markowitz efore tvp | youth young_people ministries | investment invest investing fta investament |
| canon festival santa margaret saviour | supplier outsourcing vendor outsource edi | retention dilution ptf | bank banking financial_market liquidity banks |
| jewish jews holocaust shoah semitism | infrastructure smart_city bionic ergonomy infrastruktura | sexual habit sexuality arousal harassment | india indian mauritius burdensome leone |
| birth baby abortion childbirth birt | deposit rim | english proficiency receptive determiner unmotivated | budget budgeting budgets bep properte |
| phenomena sonic | intrusion alert attacker malware malicious | religious religion religiosity secularization anthropologist | commerce millennial b2c outreach millennials |
| rome francis bernard beran statutes | budget budgeting budgets bep properte | recording recordings webster | tax taxation income_tax vat tax_rate |
| kingdom royal emperor dynasty prince | investment invest investing fta investament | birth baby abortion childbirth birt | value_add profitability eva envelopment financial_performance |
| war diplomacy moscow defeat uprising | latent sit evaluations samplex | differentiation puma ovo diferenciace | fields |
| sport club sports olympic sporting | advertising advertisement advertise influencer neuromarketing | medical ebm funder ambulances k4care | macro cas dimen |
Prompt: You will be provided with a table with four sets of topics representative of authors from four clusters in Law.
Your task as an expert annotator with deep knowledge of Law is to identify the overarching theme of each cluster based on the representative topics.
Output the following items (in English) that describe the topic of the cluster: ‘short label’ (at most 3 words and format in Title Case), ‘long label’ (at most 8 words and format in Title Case), list of 10 ‘keywords’ (ordered by relevance and format in Title Case), and ‘summary’ (few sentences).
Cluster 1 - Short Label: Law, Religion & History
Long Label: Law, Religion, History, And Socio-Cultural Issues
Summary: This cluster explores the interplay between law and diverse societal elements including religion, historical events, and cultural practices. It covers topics like religious legal systems, historical governance, family law matters, international relations concerning conflict, and the legal aspects of sports and significant historical atrocities.
Cluster 2 - Short Label: Commercial & Tech Law
Long Label: Modern Commercial, Technology, And Financial Regulatory Law
Summary: This cluster focuses on contemporary legal areas dealing with commerce, technology, finance, and public safety. It includes business transactions, IT and cybersecurity regulations, financial and investment law, urban infrastructure development, advertising, traffic accident liability, and public budgeting.
Cluster 3 - Short Label: Criminal & Social Law
Long Label: Criminal Law, Family Law, And Socio-Medical Legal Issues
Summary: This cluster centers on legal fields addressing crime, social welfare, and medical contexts. Key areas include criminal justice and forensics, family law (including birth and sexual harassment), youth justice, the intersection of law and religion, and legal aspects of healthcare and medical practice.
Cluster 4 - Short Label: Business & Financial Law
Long Label: Corporate, Financial, Tax, And International Business Law
Summary: This cluster is concentrated on legal frameworks governing business operations, finance, and taxation, often with an international dimension. It covers areas such as small and medium-sized enterprises, banking and financial markets, investment strategies, international commerce (with focus on specific jurisdictions like India/Mauritius), and tax regulations.
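The two prompts shown above differ only in the discipline name, so they can be generated from one template. The sketch below is illustrative: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, appending the topic table after the instructions is an assumption, and no model call is shown because the source does not say which model or API was used.

```python
# Template reproducing the prompt wording, with the discipline as a placeholder.
PROMPT_TEMPLATE = (
    "You will be provided with a table with four sets of topics representative "
    "of authors from four clusters in {discipline}.\n"
    "Your task as an expert annotator with deep knowledge of {discipline} is to "
    "identify the overarching theme of each cluster based on the representative "
    "topics.\n"
    "Output the following items (in English) that describe the topic of the "
    "cluster: 'short label' (at most 3 words and format in Title Case), "
    "'long label' (at most 8 words and format in Title Case), list of 10 "
    "'keywords' (ordered by relevance and format in Title Case), and 'summary' "
    "(few sentences).\n"
)


def build_prompt(discipline, topic_table):
    """Fill in the discipline name and append the markdown topic table (assumed layout)."""
    return PROMPT_TEMPLATE.format(discipline=discipline) + "\n" + topic_table


# Hypothetical two-line table standing in for the full topic table.
example_table = "| negative_pc1 | positive_pc1 | negative_pc2 | positive_pc2 |\n|---|---|---|---|"
prompt = build_prompt("Law", example_table)
```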
This work was financially supported by the project OP JAK: Knowledge in the Age of Distrust, CZ.02.01.01/00/23_025/0008711.
Homology workshop, Centre Universitaire de Norvège à Paris (CUNP), Paris, May 22–23, 2025